Example for Si active learning
This example walks through an active learning workflow for a silicon system. The files are located in pwact/example/si_pwmat/. The workflow first builds the initial training set with INIT_BULK. A model is then trained on that set, and active learning sampling is performed at temperatures of 300K, 500K, and 900K, starting from the perturbed structures generated by INIT_BULK.
The DFT software used in this example is PWmat. Input files for VASP are provided in pwact/example/si_vasp, for CP2K in pwact/example/si_cp2k, and for DFTB in pwact/example/si_dftb.
Please note that the DFT settings provided in this example are intended only for testing that the workflow runs; they do not guarantee calculation accuracy.
INIT_BULK
Launch Command
pwact init_bulk init_param.json resource.json
INIT_BULK Directory Structure
The directory structure of INIT_BULK is as follows. atom.config, POSCAR, resource.json, init_param.json, relax_etot.input, relax_etot1.input, aimd_etot1.input, and aimd_etot2.input are input files; collection is the result summary directory; and pwdata is the directory where the pre-training data is located.
collection directory
The init_config_0 directory contains the results for atom.config after relaxation, supercell generation, lattice scaling, perturbation, and AIMD.
relax.config is the structure file obtained by relaxing atom.config. super_cell.config is the structure file obtained by generating a supercell from relax.config. 0.9_scale.config and 0.95_scale.config are the structure files obtained by scaling the lattice of super_cell.config.
The 0.9_scale_pertub directory contains 30 structures obtained by perturbing the atomic positions of the 0.9_scale.config structure.
The pwdata directory contains the trajectories computed from the perturbed structures using aimd_etot1.input, extracted into the pwdata format. It includes train and valid subdirectories for the training and validation sets, respectively.
In the train (or valid) directory, atom_type.npy contains the atomic types in the structure, position.npy contains the atomic positions, and energies.npy, forces.npy, ei.npy, and virials.npy contain the total energy, the atomic forces in three directions, the atomic energies, and the virial of the structure, respectively. ei.npy and virials.npy are optional and are only extracted if the trajectory includes atomic energies and virials.
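As a quick sanity check on the layout described above, the following minimal sketch reports which of the named files are missing from a train (or valid) directory and which optional files are present. The function name check_pwdata_dir and the split into required and optional files are assumptions drawn from the description in this section, not part of pwact itself. The directory tree also lists image_type.npy and lattice.npy, which this sketch does not check.

```python
from pathlib import Path

# Files described in this section as always present (assumption based on the text).
REQUIRED_FILES = ("atom_type.npy", "position.npy", "energies.npy", "forces.npy")
# Files described as optional: only written when the trajectory provides them.
OPTIONAL_FILES = ("ei.npy", "virials.npy")


def check_pwdata_dir(directory):
    """Return (missing_required, present_optional) for a train/valid directory.

    Hypothetical helper for illustration; it only checks file names,
    not the contents or shapes of the arrays.
    """
    d = Path(directory)
    missing_required = [name for name in REQUIRED_FILES if not (d / name).is_file()]
    present_optional = [name for name in OPTIONAL_FILES if (d / name).is_file()]
    return missing_required, present_optional
```

For example, check_pwdata_dir("pwdata/train") would return an empty first list when all four required files exist; the path here is only illustrative.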
directory1
example/init_bulk
├──atom.config
├──POSCAR
├──resource.json
├──init_param.json
├──relax_etot.input
├──relax_etot1.input
├──aimd_etot1.input
├──aimd_etot2.input
├──pwdata
└──collection
├──init_config_0
│ ├──super_cell.config
│ ├──0.9_scale.config
│ ├──0.9_scale_pertub
│ │ ├──0.9_scale.config
│ │ ├──0_pertub.config
│ │ ├──1_pertub.config
│ │ ├──2_pertub.config
│ │ ...
│ │ └──30_pertub.config
│ ├──0.95_scale.config
│ ├──0.95_scale_pertub
│ │ ├──0.95_scale.config
│ │ ├──0_pertub.config
│ │ ├──1_pertub.config
│ │ ├──2_pertub.config
│ │ ...
│ │ └──30_pertub.config
│ ├──PWdata
│ │ ├──train
│ │ │ ├──atom_type.npy
│ │ │ ├──energies.npy
│ │ │ ├──image_type.npy
│ │ │ ├──position.npy
│ │ │ ├──ei.npy
│ │ │ ├──forces.npy
│ │ │ ├──lattice.npy
│ │ │ ├──virials.npy
│ │ ├──valid
│ │ │ ├──atom_type.npy
│ │ │ ├──energies.npy
│ │ │ ├──image_type.npy
│ │ │ ├──position.npy
│ │ │ ├──ei.npy
│ │ │ ├──forces.npy
│ │ │ ├──lattice.npy
│ │ │ ├──virials.npy
│ └──valid
├──init_config_1
...
0.95_scale_pertub 0.9_scale_pertub relaxed.config train
├──init_config_2
Active Learning
We perform active learning using the pre-training data and the perturbed structures from the INIT_BULK step at temperatures of 500K, 800K, and 1100K.
To start the process, after the init_bulk command has finished, navigate to the pwact/example/si_pwmat/ directory and run the following command:
pwact run param.json resource.json
Active Learning Directory Structure
The directory structure of active learning is as follows.
param.json and resource.json are the input control files for active learning, and scf_etot.input is the input file for self-consistent field (SCF) calculations in active learning.
si.al is the record file for the active learning process, documenting the executed active learning steps.
iter_result.txt records, for each round of active learning, how many explored structures were classified as accurate, selected for labeling, or rejected as errors. The content is shown in the following example.
iter.0000 Total structures 404 accurate 122 rate 30.20% selected 187 rate 46.29% error 95 rate 23.51%
iter.0001 Total structures 404 accurate 334 rate 82.67% selected 70 rate 17.33% error 0 rate 0%
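To track how the selection rate changes across iterations, lines in this format can be parsed with a small script. The sketch below assumes every line follows exactly the layout of the two example lines above; parse_iter_line and selected_rate are hypothetical helper names, not part of pwact.

```python
import re

# Pattern for one line of iter_result.txt, based on the example lines above.
LINE_RE = re.compile(
    r"(?P<name>iter\.\d+)\s+Total structures\s+(?P<total>\d+)"
    r"\s+accurate\s+(?P<accurate>\d+)\s+rate\s+[\d.]+%"
    r"\s+selected\s+(?P<selected>\d+)\s+rate\s+[\d.]+%"
    r"\s+error\s+(?P<error>\d+)\s+rate\s+[\d.]+%"
)


def parse_iter_line(line):
    """Parse one record into a dict with the iteration name and integer counts."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized iter_result line: {line!r}")
    record = {"name": match["name"]}
    for key in ("total", "accurate", "selected", "error"):
        record[key] = int(match[key])
    return record


def selected_rate(record):
    """Percentage of explored structures selected for labeling."""
    return 100.0 * record["selected"] / record["total"]
```

Recomputing the rates from the counts, rather than reading the printed percentages, avoids depending on how the percentages are formatted (note the bare 0% in the second example line).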
iter.0000 is the directory for the first iteration of active learning, iter.0001 is for the second iteration, and so on.
train, explore, and label are the directories for the training, exploration, and labeling tasks, respectively, in each iteration of active learning.
train directory
In the train directory, we adopt a committee query strategy with 4 models. There are 4 training tasks; the 4 models have identical configurations and differ only in the initial values of their network parameters.
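The idea behind a committee query can be illustrated with a short sketch. This section does not state which disagreement measure or thresholds pwact uses, so everything below is an assumption: the measure is the largest per-atom standard deviation of the predicted force vectors across the models, and the lower and upper bounds are hypothetical parameters. The three labels mirror the accurate, selected, and error categories reported in iter_result.txt.

```python
import math


def max_force_deviation(forces_by_model):
    """Largest per-atom standard deviation of force vectors across models.

    forces_by_model: one entry per model, each a list of (fx, fy, fz)
    tuples with one tuple per atom. Assumed measure, for illustration only.
    """
    n_models = len(forces_by_model)
    n_atoms = len(forces_by_model[0])
    worst = 0.0
    for atom in range(n_atoms):
        # Mean force vector on this atom over all models.
        mean = [
            sum(model[atom][k] for model in forces_by_model) / n_models
            for k in range(3)
        ]
        # Mean squared distance of each model's prediction from that mean.
        variance = sum(
            sum((model[atom][k] - mean[k]) ** 2 for k in range(3))
            for model in forces_by_model
        ) / n_models
        worst = max(worst, math.sqrt(variance))
    return worst


def classify(deviation, lower, upper):
    """Map a deviation to a category; lower/upper are hypothetical thresholds."""
    if deviation < lower:
        return "accurate"
    if deviation < upper:
        return "selected"
    return "error"
```

Under this scheme, structures on which the models agree need no new labels, structures with moderate disagreement are sent for DFT labeling, and structures with very large disagreement are discarded as likely unphysical.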
0-train.job, 1-train.job, 2-train.job, and 3-train.job are the Slurm job scripts that execute the training tasks. After the training tasks complete, four tag files (0-tag.train.success, 1-tag.train.success, 2-tag.train.success, 3-tag.train.success) are generated to indicate that each task finished successfully. The four models are saved in the train.000, train.001, train.002, and train.003 directories.
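When inspecting a run by hand, the tag files give a simple way to confirm that every training task finished. The sketch below is a hypothetical helper, not part of pwact; it only assumes the tag file naming shown above.

```python
from pathlib import Path


def missing_train_tags(train_dir, n_models=4):
    """Return the names of success tag files that are not present in train_dir.

    An empty list means every training task left its success tag.
    """
    d = Path(train_dir)
    expected = [f"{i}-tag.train.success" for i in range(n_models)]
    return [name for name in expected if not (d / name).exists()]
```

For example, missing_train_tags("iter.0000/train") returning an empty list would indicate that all four tasks in the first iteration completed; the path is illustrative.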
Taking the train.000 directory as an example:
train.json is the input file for PWMLFF model training. std_input.json summarizes the training parameter settings output by PWMLFF. The model_record directory is where the models are saved, and dp_model.ckpt is the model file. epoch_train.dat contains the average training error at each epoch, and epoch_valid.dat contains the average error on the validation set after each epoch of training.
torch_script_module.pt is the model file produced by compiling dp_model.ckpt with the jitscript tool. It is used as the force field for simulations in LAMMPS.